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ABSTRACT 

Sky coverage is one of the most important pieces of information about astronomical observations. We 
discuss possible representations, and present algorithms to create and manipulate shapes consisting 
of generalized spherical polygons with arbitrary complexity and size on the celestial sphere. This 
shape specification integrates well with our Hierarchical Triangular Mesh indexing toolbox, whose 
performance and capabilities are enhanced by the advanced features presented here. Our portable 
implementation of the relevant spherical geometry routines comes with wrapper functions for database 
queries, which are currently being used within several scientific catalog archives including the Sloan 
Digital Sky Survey, the Galaxy Evolution Explorer and the Hubble Legacy Archive projects as well 
as the Footprint Service of the Virtual Observatory. 
Subject headings: astrometry — catalogs 
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L INTRODUCTION 

Astronomers have to keep accurate records of where 
their observations are located on the sky. Beyond the 
direction and angular size of the field of view, we have 
detailed information available about the precise sky cov- 
erage derived from the exact shape of our detectors. This 
coverage is invaluable for most statistical studies, e.g., lu- 
minosity functions or especially analyses of spatial clus- 
tering. 

The de facto standard of the Fl exible Image Transport 
System (FITS ; I Wel ls et al.||1981[) has reserved keywords 
for the W orld Coordinate System (WCS; Greisen fc Cal-| 
abretta|[2QQ2j specification, and parameters that specify 
the transtormation from image pixels to sky coordinates 
(and reverse), are present in the header of most FITS 
files. While the WCS is perfectly adequate for individual 
exposures, multiple observations are difficult to describe 
in a single system. Every field has potentially a sepa- 
rate coordinate system, hence moving from field to field 
is convoluted, and makes it cumbersome to answer even 
simple questions, e.g., whether two separate fields over- 
lap. 

The footprint of a large-area survey might be com- 
plicated but the small-scale irregularities are even more 
problematic. Not only the depth of a survey changes as 
a function of the position on the sky, but parts of the 
observations are often censored for various reasons, such 
as bright stars blocking the view, satellite trails, artifacts 
from refiections. If we would like to represent all these on 
the sky, we need a scalable solution that works for shapes 
of arbitrary complexity and size from the subpixel level 
to the entire sky. We need tools to create and manip- 
ulate these descriptions right there where the data are 
and utilize the information efficiently. 

Geographic Information Systems (CIS) were designed 
with a similar goal in mind. There are however subtle 
differences, that are large enough that CIS systems are 
not quite applicable to astronomy directly. The modern 
mapping systems do not extend much beyond the basic 
features of projected maps but provide very efficient tools 
for finding nearby places and shortest routes, etc. Even 



the most complicated CIS shapes are limited to spherical 
polygons whose sides are great circles (or straight lines 
in the projection.) In astronomy, the approximations 
with such polygons would be unacceptably inaccurate 
or prohibitively redundant, hence there is need for an 
extended set of features to represent the geometries of 
the observations and surveys. 

Some of the concepts discussed in the paper are not 
new and have been introduc ed and studied pre viously 
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([1990); Fek ete] ( |199Q1 ); [Short et"aL] (1995) developed an' 
icosahedron- based methodology for Earth sciences, and 
Kunsz t et al.^ (2000^ 2QQ1J introduced the convex descrip- 
tion and the Hierarchical Triangular Mesh, a tr ian^ula- 
tion that was also used by |Lee fc Samet (11998^ with a 
different numbering sch eme. A similar repre sentation of 
shapes is also found in [Hamilton & Tegmark (2004 1) and 
Swanson et al.| (|2008D. Goodchild et al. (J991);TGood-| 



child &^Bhiren ( 199?|ran 3|^ong, Kimerling &: Sahr ( 2i _^ 
introduced the Discrete Global Grid for GIS systems, an 
equal area variant of the same triangulation idea. The 
integration to relational databases is discussed in [Gray 
et al. (2004). 

The goal this paper is to provide the astronomy com- 
munity with a complete and consistent view of the cur- 
rent and much improved methodology built on a new 
fully functional spherical geometry framework, and de- 
scribe the implementations and interfaces used in several 
astronomy archives a nd tools today includ ing the Sloan 
Digital Sky Sur vey (JThakar et al.||2008|), the Hubble 
Legacy Archive (Greene et al. 2Q07),^the~Galax y Evolu- 
tion Explor er and the IN VU l^botprint Service ( ,Budavari 
et al.| 2007). In Section [2] we describe how to specify 



spherical shapes and manipulate them. Section [3] deals 
with the spherical geometry of the region representations 
and Section [4] discusses an efficient search method based 
on the Hierarchical Triangular Mesh. In Section [5] we 
provide details of our software solution including the im- 
plementation of the database routines and their usage. 
Section [6] summarizes the main results, along with cur- 
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rent and future applications in astronomy. 

2. SHAPES ON THE CELESTIAL SPHERE 

In cartography, maps are typically local projections, 
e.g., the pages of an atlas that just overlap so that they 
provide enough reference for navigating a road or a trail, 
and in general, moving from one projection to another 
can be quite difficult. In astronomy, the usage pattern is 
different and usually more global that warrants the use 
of a true spherical geometry. 

Spherical polygons are closed geometric figures over 
the sphere, formed by arcs of great circles. Generalized 
spherical polygons (GSPs) are similar, but their arcs can 
also be small circles. These are conceptually simple and 
yet versatile enough to represent most common shapes in 
astronomy. They easily describe circles, rectangles (even 
with curved sides) , and where they cannot precisely track 
a boundary, an accurate approximation is possible by a 
short series of arc segments. Not only do they conve- 
niently describe spherical shapes but they also provide a 
very compact representation. For example, the vertices 
and the equations of the edges (arcs) define the outline 
of a generalized spherical polygon, and its inside can be 
determined by the order of the vertices. One convention 
is to define the inside to be to the left as one traverses 
around the polygon. Either a small or a great circle can 
be defined as the intersection of the unit sphere with a 
3-D plane. This enables us to use another, dual repre- 
sentation, by using half-space contraints that define the 
interior surface area of these spherical circles, instead of 
their outline. If we embed the surface of the unit sphere 
in a 3-dimensional Eucledian space, we can use 3D di- 
rected planar (halfspace) constraints and their boolean 
combinations to select parts of the sphere that repre- 
sent various spherical shapes of this family describable 
by GSPs. Regions that cannot be represented this way 
would be the curves defined by intersections of higher 
order surfaces with the sphere, like quadrics. These two 
alternative descriptions are formally equivalent but have 
different properties that make them preferable for certain 
types of problems. Both have advantages and disadvan- 
tages but we do not have to choose one. In this section, 
we first elaborate on the surface or convex representa- 
tion in detail with special emphasis on its strengths, and 
point out how to obtain the outline algorithmically. 

2.1. Half spaces, Convexes and Regions 

Going from a two-dimensional representation of spher- 
ical shapes to a three-dimensional description provides 
a uniform framework with readily available geometrical 
concepts to build on. A halfspace is a directed plane that 
splits the 3-D space in two. In our context, it represents 
a spherical cap when intersected with the unit sphere. 
It is defined by a direction, i.e., a unit vector n, and a 
signed scalar offset c, measured along the normal vec- 
tor from the origin. Using a unit-sphere centered on the 
origin of a Cartesian coordinate system, the parameter c 
can take values between 1 (an infinitely small cap) and -1 
(the whole sky) . The value of corresponds to a special 
case and represents half the sky cut along a great circle. 
In general, c is the cosine of the angular radius of the 
cap. The caps with a negative c are called holes. An- 
other common shape in astronomy is a rectangle in the 



tangent plane defined by the geometry of the detector. 
Since a straight line in any tangent plane projects onto 
a great circle on the surface of the sphere, the rectangle 
is a convex formed by the intersection of four zero-offset 
halfspaces, whose normal vectors point in the direction 
of the neighboring vertices' cross products. 

An intersection of halfspace constraints represents a 
(possibly open) 3-D coni;ea; polyhedron, which in turn de- 
scribes a more complicated shape when it intersects the 
unit sphere. Despite their simplicity in 3-D, convexes can 
define a large variety of spherical shapes, e.g., rings, dia- 
monds or other polygons of "straight" or curved sides. In 
fact, they can even represent multiply connected shapes 
of arbitrarily large topological complexity. For example, 
the vertices and edges of a large enough cube centered on 
the sphere can define eight disjoint generalized spherical 
triangles as they poke through the surface. 

Nevertheless, convexes are constrained in 3-D and, 
hence are limited in what types of spherical shapes can 
describe. More general spherical regions can be defined 
as the unions of convexes. This hierarchy of halfspaces, 
convexes and regions and their negation (or difference) 
provides a complete algebra over these spherical regions, 
enabling extreme flexibility and efficiency. 

2.2. Point in a Region 

One of the most common tasks is the containment test 
to decide whether a point is inside a region. We start 
with the most basic elements of the region, the halfs- 
paces. A point (unit vector r) is inside a halfspace (n, c), 
if the dot product of the two vectors is greater than the 
offset, 

n r > c. (1) 

While this is straightforward to see, its computational 
simplicity is striking, and has serious consequences for 
the performance of any implementation using this for- 
malism. For example, the computation requires only 
three multiplications and one comparison for a circle and 
four times that for a spherical quadrangle, regardless of 
the size and actual shape. 

Testing a point against a convex simply consists of 
checking whether the point is inside all its halfspaces. 
If yes, the point is inside; otherwise outside. As a region 
is the union of its convexes, it contains a point if any of 
its convexes contains the point. 

2.3. Boolean Algebra of Regions 

Within this framework, the boolean algebra of regions 
maps very well onto basic set operations on the collec- 
tions of halfspaces and convexes. The set of regions is 
closed for the operations one routinely performs when 
deriving survey specific geometries. The same is not true 
for the convexes. 

The union of two or more regions is a region that in- 
cludes all the convexes, 

rW u i?(2) = CW U . . . U Ci^) U Cf ) U . . . U C(?) (2) 

= C\U...UCn+m = R (3) 

by definition. Also the intersection of two or more con- 
vexes is a convex that includes all their halfspaces, 

c(i) n c(2) = ff « n . . . n ff (1) n fff ^ n . . . n ff^^) (4) 
= Hin...nHr,+m = C (5) 
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Fig. 1. — The concave diamond shape is described by four half- 
spaces of negative offsets, whose arcs draw its outhne, plus a fifth 
constraint that separates the diamond from the remainder of the 
sphere outside the four holes. 



and, in turn, the intersection of regions is a region whose 
convexes are the pairwise intersections of the convexes, 
e.g., for two 



i?Wni?(2) = U(c«ncf) 



(6) 



It is straightforward to define the differences of halfs- 
paces, convexes, and regions, even if not as simple as the 
above operations. Let us first look at two halfspaces. We 
see that their difference is 



Hi\H2 = Hin H2 



(7) 



where H is the negate or inverse of H^ which is obtained 
by simply flipping the sign of its normal vector and the 
offset. Similarly, subtracting a halfspace from a convex is 
also simple as is the subtraction from a region. It might 
look logical to express the difference of two convexes as 
a region whose convexes are 



c(^ 



\C(^)=U(c^W\^f)=U(^^'^^^P) (8) 



but the constructed convexes would overlap, hence the 
following procedure is preferred to avoid the overlaps 



cW\c(2) = U cWn^Pfl^fc 



(9) 



Also we difference a region and a convex by subtract- 
ing the convex from all the convexes of the region. By 
substituting the all-sky coverage into C^ of the previous 
equation, we get the region of a negated convex 



C = [j\H,{^H, 



(10) 



/c=l 



The negate of a region is in turn the intersection of its 
inverted convexes, by a simple application of deMorgan's 
rules. From these formulas we can also see that it is not 
enough to stop at the level of the convexes, because even 
if the basic building blocks of a particular geometry are 
simple, subsequent operations quickly yield more com- 
plicated descriptions that can only be represented as a 
region. 



3. SPHERICAL GEOMETRY 

With this elegant formalism in hand, the practical chal- 
lenge is to derive irreducible representations of the re- 
sults of these operations, discarding empty regions and 
redundant constraints. In general, this can be a compute- 
intensive task but, most of the time it is very fast and 
done as a pre-processing step that is well- worth the effort. 
The description not only becomes more compact but the 
simpler form speeds up subsequent analyses. This re- 
gion simplification is the topic of this section, where one 
has to move beyond the basic 3-D concepts of the previ- 
ous paragraphs and solve the spherical geometry of the 
region. 

3.1. Patches and Simplified Convexes 

A region is the union of convexes, so one has to start by 
simplifying all of its convexes individually. For many of 
the subsequent tasks the frequency of intersections will 
depend on the radii of the caps, thus it is a good practice 
to sort the collection by increasing size. After making 
sure that the smallest cap is indeed finite (if infinitely 
small, the convex is empty, and is eliminated), we ex- 
amine the pairwise relations of the halfspaces. Based on 
the topology of the halfspaces, we can discard the ones 
that are the same as another, and those that fully con- 
tain other halfspaces. If we find halfspaces that are each 
other's inverse or simply disjoint, the convex is empty. 

Having done the trivial pruning of halfspaces, there 
might still be redundant constraints but one can only 
reject them by explicitely solving for the vertices and arcs 
of the convex. The circumference of a halfspace, a circle, 
would generally intersect other circles. We compute the 
roots (0, 1 or 2) for all possible pairs of circles, keeping 
track degenerate roots where multiple (> 2) halfspaces 
intersect. We now collect the roots on each circle and 
form arcs around the circle between the roots. Some of 
the arcs are invisible, i.e., outside the convex, these we 
can ignore, and can also prune degenerate ones, if any. 
At this point, one can start to form a chain (linked list) 
of the arcs by taking one randomly and looking to match 
its end point with the start point of a next arc. Then we 
repeat adding arcs until the starting point of the first arc 
is reached. This singly connected part enclosed by the 
arcs is called a patch. 

In general, a convex could have more than one patches, 
in which case there could be leftover arcs. With these 
we can repeat the previous procedure to form the re- 
maining of the patches. In some sense, the patches are 
the most basic and compact elements of the convexes. 
We derive a minimal enclosing circle for every patch and 
store them with the convex. The bounding circles facil- 
itate quick rejections in containment tests for convexes 
of many halfspaces, and enable faster collision or overlap 
detections between shapes. In the process we keep track 
of the halfspaces whose arcs are present in any of the 
patches, these are the halfspaces that we have to keep in 
the simplified representation. On top of these, we have to 
add those halfspaces that are needed to reject the roots 
outside the convex. This may happen in situations such 
as the diamond shape that is defined as the intersection 
of four holes. Since the surface of the sphere is a closed 
manifold, these four negative halfspaces could define two 
patches so another halfspace is needed to select the patch 
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Fig. 2. — The union of two overlapping rectangles (left) is turned into three disjoint convexes (middle) in the process of simplification. 
The outline (right) eliminates those parts of the arcs that are internal and cancel out. 



Fig. 3. — The subtraction algorithm is illustrated on two rectangles (left). The difference consists of four disjoint rectangles (middle), 
whose outline (right) has much fewer arcs than the patches. 



we want, see Figure [T] By the end of the algorithm, the 
convex is left with the minimum number of halfspaces 
that still describes the same shape. 

3.2. Region Simplification 

Naturally, the simplification of a region starts by pro- 
cessing its convexes one by one. If we know that the 
convexes are disjoint or not interested in the analytic 
area calculation, we are done. The following steps can 
not only provide a region description with disjoint con- 
vexes and precise areas but enable a potential perfor- 
mance boost from a simpler representation. 

Building on the bounding circles, first an approximate 
collision graph is calculated, where the links note which 
convex (a node in the graph) might overlap with others. 
It is possible that some convexes simply contain others, 
in which case the smaller ones are redundant, and thus 
quickly eliminated, simplifying the collision graph. 

Partial overlaps are more difficult to deal with, and 
here the region algebra proves (see Section 2.3) to be in- 
valuable. To guarantee disjoint convexes, one has to look 
for potential collisions between two convexes and derive 
new ones that do not intersect. In practice, this is simply 
the convex subtraction, where one keeps the larger con- 
vex and substitutes the smaller one with the difference 
of the two. The collision graph is updated to include the 
new convexes that are now disjoint from the larger one. 
Since the new convexes present a smaller area than their 



progenitor, they can only collide with those that were 
linked to the eliminated one. We repeat this procedure 
until the graph has no links left. We proceed by work- 
ing with the larger convexes to guarantee that those are 
not broken up into many small pieces. It is common for 
regions to have more convexes than the original descrip- 
tion after this step, but they will be now disjoint. During 
this simplification process the number of collisions often 
grows as for every eliminated convex we introduce one or 
more, whose collision links are inherited and get queued 
for analysis. 

It is worth noting that one can also eliminate convexes 
by stitching them. For example, two adjacent rectangles 
next to one another that share an arc and have neighbor- 
ing halfspaces that are identical, can be substituted by a 
single convex. Stitching is not only a cosmetic improve- 
ment but in certain situations can make a huge impact 
on the performance. One example is when working with 
pixels that can be merged. Heuristic simplificatio ns c an 
make use of this feature, as we will see in Section [3^ 



3.3. The Outline of a Region 

The dual halfspace-convex-region representation has 
many advantages as discussed before, but in certain sit- 
uations a third representation, the outline^ provides fur- 
ther new opportunities. One obvious application is visu- 
alization, when we would like to render regions projected 
on the sky. In Section [4] we will also look at how the out- 
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line can be used in advanced algorithms to accelerate 
searches for points in a region. We started by stating 
that the surface and shape descriptions are theoretically 
equivalent but in practice going from one to the other is 
not always straight forward. Next we discuss the outline 
and its derivation. Going the other way, from outline to 
surface representation is less convenient and not really 
necessary, when working primarily with halfspaces and 
convexes. 

The patches of a region are essentially the outlines of 
the convexes, but not necessarily the outline of the re- 
gion. For example, the outline of a region with a single 
convex that has only one patch consists of the patch's 
arcs. This is also true if the convex has multiple patches 
as patches of the same convex can never touch each other 
at more than a point. As soon as multiple convexes are 
present, adjacent ones will share arcs, at least partially. 
We can define a segment as part of a directed arc that 
has a start and an end point, and there are no additional 
vertex point along the arc between these two. The al- 
gorithm to create an outline would just have to remove 
those segments from all patches that "annihilate" each 
other, i.e. there are two directed segments which are 
identical in geometry but opposite in direction. The first 
step is to break up each arc into unique segments, which 
is performed by grouping the arcs by common circles, 
then ordering the circle, and breaking all arcs up into 
distinct segments. A possible approach to identify the 
"canceling" segments is to look along the common cir- 
cles in the collection of all patches and for every circle 
identify the places where two segments are opposite to 
one another, and hence cancel out. In this case both are 
removed. To preserve the relation to the original patches, 
which have precomputed bounding circles, we keep the 
structure of the outline similar to that of the region, but 
the arcs of the patches are replaced by smaller segments. 
Such representation is most advantagous but lacks the 
connectivity of the arcs, which requires the visualizer to 
draw the arcs independently lifting the pen. A conti- 
nous outline is chained into a loop of the correct order 
by checking the start- and endpoints. 

3.4. Heuristic Simplification with Igloo 

When none of the above simplification methods work 
well, we can use hybrid techniques where we combine 
the advantages of pixelization with the exact geometry. 
Let us imagine an large area survey that takes tens of 
thousands of pointed observations with a circular field- 
of-view, so that the circles fully overlap. While the edges 
of the survey are rippled and need many circles to rep- 
resent it accurately, the inside of the region is contigous 
and simple. The aforementioned simplification rules in- 
cluding the stitching will not be able to achieve much 
improvement, although there should be a simpler form. 
The previous algorithms would eventually also choke on 
a region with ~50,000 convexes where every one of the 
convexes overlaps with 12 other. This is similar to a good 
approximation for the sky coverage of the Faint Images 
of the Radio Sky at Twe nty-centimeters (FIRST; Becker, 
White fc Helfand|p95t survey. ' ' 

It one could break up the region into small pieces that, 
unlike small circles, can be stitched together neatly, then 
one could potentially define a small number of large con- 
vexes in the middle of the region, and eliminate the union 



of tens of thousands of caps. The algorithm is an ele- 
gant divide-and-conquer recursion. We need a pixeliza- 
tion that is a hierarchical subdivision of the surface of 
the sphere, where the pixels are not only nicely adjacent 
but also share many halfspace constraints that allow for 
extensive mer ging. Such pixeliz ation is achieved by the 

We start with a region 



Igloo scheme (Crittenden 



2000) 



and build a tree, e.g, usmg the (3:0:3) pixelization, where 
only those nodes are kept that collide with the region. 
Every Igloo pixel is a simple convex, aswe have defined 
this term, namely a generalized spherical triangle or a 
rectangle, hence the usual region operations can be di- 
rectly applied. The depth-first recursion is very efficient 
when one records for every node of the tree the convexes 
of the region that it overlaps with, so that on the deeper 
levels of the tree with more and more nodes, less and 
less convexes are to be tested per each node. We typi- 
cally specify the stopping criteria for building the tree by 
pixel size (maximum number of levels) or by the maxi- 
mum number of leaf nodes. Once the tree is ready, the 
geometry of every leaf is examined. We subtract the col- 
liding convexes of the region from each leaf one by one 
and calculate its area (see Section 3.5). 

If the pixel is completely covered, we take the convex 
of the pixel, otherwise derive its exact shape. The merg- 
ing of the pixels into larger and larger convexes is done 
along the tree using the full pixels only, and once the 
fragmentation of the region stops the pixels to grow, one 
can stitch the neighboring pixels, if they are on the same 
level. 

This procedure yields the exact region with alterna- 
tive internal convexes. Taking it one step further, we 
can also create an even simpler region by keeping the 
full pixels for nodes where the overlap area is large even 
if not completely full. For these nodes, we define another 
region called the mask that defines the overshoot area of 
the approximation. Typical footprints already have such 
masks to censor areas covered by bright stars, satellite 
trails, etc., and adding some more does not make the 
subsequent analyses more difficult, instead, it can signif- 
icantly simplify the description of the region. 



3.5. Area Calculation 

A patch is just a generalized spherical polygon 
bounded by small (and/or great) circles, whose arcs are 
ordered by design. The trick is the break a patch up into 
more regular pieces whose areas we can calculate. Let us 
pick a point arbitrarily on the sphere, e.g., at the center 
of mass of the vertices of a patch, and break up the poly- 
gon into spherical triangles such that one of the vertices 
of every triangle is this selected point and it contains an 
arc of the patch. Now every triangle has two great circle 
arcs and one of the original arcs. We follow the approach 
of Goodchild & Shiren (1992), by subdividing this shape 
into a triangle of great circle "arcs and the leftover shape, 
known as the semilune. The area of the former is given 
by the Girard formula that has several variants. Here we 
list the one that is the most robust against degeneracy 
and hence should be preferred for numerical calculations 
over equations that appear to be simpler algebraically: 



Ag = ^ tan ^ ^/z 



(11) 
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Fig. 4. — Part of the FIRST footprint illustrates the power of the heuristic simplification based on a hierarchy of Igloo pixels. The 
brute- force method (left) runs much slower and produces many more disjoint convexes than the novel technique (right) that fills the inside 
of the region with stitchable convexes and merges them well. 



with 

z = tan 



(I) 



. s—a\ fs—b\ ( s—c 
tan(^-jtan(— jtan( — 



(12) 



where s is half the circumference of the triangle with the 
sides a, 6, c in radians. 

The area of a semilune is also calculated analytically 
but it is more convoluted. For completenes s, here we 
print the f ormul as without further explanatioi jGoodchild| 
& Shiren|(fT992l). 



a — h cos d 



(13) 



where Q is the half angle of the small circle, i.e., the radius 
of the cap, and 



a = 2 arcsin (tan (arcsin r) / tan ( 
b = 2 arcsin(r / sin 0) 



(14) 
(15) 



with r being half the Eucledian distance between the end 
points of the arc. The situation is slightly complicated 
by the fact that certain patches are actually holes (for 
cos^ < 0), whose area is propagated with a negative sign 
to obtain the correct total sum for the region. 

4. THE HIERARCHICAL TRIANGULAR MESH 

Another pixelization of the sphere in combination with 
the region representation enables fast spatial searches for 
catalog entries of stars, galaxies and other astronomical 
sources that are within a spherical region on the sky. The 
idea is to apply a coarse filter to the entire dataset and 
to reject most of the sources that are outside the search 
region, and perform the computationally more expensive 
geometry test on a much smaller subset of candidates 
that already passed the filter. 

Our choice for such a filter is based the Hierarchical 
Triangular Mesh, also known as the HTM (Kunszt et al. 
[2000) . Next we discuss the properties and features of the 
HTM, then introduce the algorithms for creating efficient 
coarse filters for regions that map very well onto the in- 
dexing facilities of currently available relational database 
engines. 

4.1. Address of a Point 

We can paint the sphere with triangular pixels that 
we call trixels defined by the HTM. The top nodes are 



the 8 faces defined by an octahedron projected onto the 
sphere. The children of each node are obtained by sub- 
dividing the existing triangular nodes into 4 new trian- 
gles. The sub-triangles have the current corners and the 
current arc's midpoints as their corners. Finer detail is 
created as new levels are added by repeating the process 
for each triangle. In the limit, the recursive subdivision 
approaches the ideal sphere as in Figure [5] 

All the triangles at any level have a unique identifier 
or trixel ID. It is an integer number, that encodes the 
position of the trixel in the hierarchy and is composed 
through the following recursive algorithm. The level 
trixels are the faces of the octahedron. We name them 
with NO, Nl, N2, N3 and SO, SI, S2, S3 where N and S refer 
to the northern and southern hemispheres, respectively. 
In the recursion, each triangle has four offsprings that are 
named by appending the index 0, 1, 2 or 3 to the parent's 
name. Figure [6] illustrates the naming convention in the 
subdivision on a few examples. 

To keep things consistent, we introduce a straightfor- 
ward mapping between the name and the ID of a trixel. 
The ID is a 64-bit integer and is more compact than the 
human-readable name that can be up to 25 bytes long. 
First we assign the bits 11 to N and 10 to S and convert 
the trixel's number betwen and 3 to binary (00 and 
11) and append it to the previous bits. We repeat un- 
til the desired level is reached. For example, the trixel 
named S2320 will convert to binary 1010111000 or dec- 
imal 696. The longer the name of a trixel, the deeper 
it is in the hierarchy. For practical purposes, we stop 
at level 20, approximately corresponding to a positional 
accuracy of about 0.3". This special level-20 trixel ID is 
called the HtmlD in our system. A 64-bit integer can hold 
a trixel number to level 30, although the standard lEEE- 
ffoat double precision breaks down after level 25 where 
the positional accuracy would be about one hundredth 
of an arcsecond. A property of the trixel ID numbers 
is that the descendants of a trixel have IDs that form a 
consecutive list of numbers. For example, the 4 children 
of S2320, namely the set of {S23200, S23201, S23202, 
S23203}, form the range of numbers 2784-2787. The 
level 20 offsprings of the same trixel form the HtmID 
range 11957188952064-11974368821247. This is impor- 
tant, because we can represent any level trixel with either 
a single number, or with a pair of low-high HtmID val- 
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Fig. 5. — The recursive subdivision of the octahedron, illustrated above, is at the heart of the Hierarchical Triangular Mesh, also known 
as the HTM. 




Fig. 6. — The naming of the hierarchical trixels provides unique 
identifiers. The ensemble of all trixels at a given level can be 
thought of as a space-filling curve on the surface of the sphere. 

ues. The consequence of using the latter representation 
is that regions with variable size trixels can be expressed 
uniformly, and because the numbering provides partial 
coherence of the HtmID numbers. 

Since trixels partition the sphere, any location is inside 
exaclty one trixel, so a level 20 trixel ID, or the HtmID 
is a fairly accurate approximation of the position. This 
property is exploited in our spatial search algorithm de- 
tailed in the subsequent paragraphs. 

4.2. Approximate Region Covers 

Simply put, a cover is a set of trixels that fully covers a 
region. It is an approximation of the shape by the union 
of a set of spherical triangles. Given a region, the algo- 
rithm starts with the eight level-0 trixels that make up 
the entire globe. The S2 in Figure [6] is one of these eight 
trixels that are initially marked as unprocessed. In the 
recursion, all unprocessed trixels are analyzed and get 
marked with one of three possible tags. The mner trixels 
are fully inside the region, the reject ones are completely 
outside and partial trixels are on the outline. Dealing 
with inner and reject trixels is easy. Inner trixels are 
saved for output, and the rejects are discarded from fur- 
ther consideration. A partial trixel is subdivided into 
four smaller trixels which are placed at the back of the 
unprocessed list. Eventually all trixels are tagged to the 
desired level of detail. 

Often it is very useful to have an approximation of 
the inside of the region that does not touch the outline. 
Sources in the inner cover are guaranteed to be inside the 
region, hence do not require extra geometry tests. This 
inner cover is the union of all the inner trixels, while 



the aforementioned outer cover is the union of the inner 
and partial trixels. In fact, it is possible to obtain both 
sets at the same time without much extra processing. 
As to where to stop, there is no universal optimum, and 
the answer will depend on the actual dataset and region, 
as well as the implementation of the search engine and 
even its hardware configuration. Fortunately, sensible 
heuristics exist and the performance of the searches are 
considerably better than the naive implementation for 
any reasonable cover shapes. 

4.3. Searching with HTM 

Let us now examine how spatial searching is performed 
using an HTM cover. Assume that every source of our 
dataset has a pre-computed HtmID that constrains its 
location on the celestial sphere to within a particular 
level-20 trixel. The outside cover of the region can be 
represented as a set of HtmID intervals. If the sources 
are ordered by their HtmID, fetching sources in these in- 
tervals is extremely fast. If the dataset resides on disk, 
which is the case for any massive live astronomical cat- 
alog, getting sources in an interval consists of reading 
sequentially from the hard drive. At the end of every in- 
terval, the disk head is raised and re-positioned to the 
beginning of the next interval. This seek-time is the 
source of the penalty one would pay for a very accu- 
rate cover that is represented by many short intervals. 
Often a couple of dozen HTM ranges provide accurate 
representations that select a small enough candidate list 
with which modern CPUs can keep up with processing 
the exact geometry calculations. In general, any custom 
stopping criterion can be utilized, e.g., based on elapsed 
time or resolution size. One particularly interesting pos- 
sibility is to monitor the area of the inner and outer cov- 
ers and stop at a desired limit on their ratio. In fact, the 
area ratio can be approximated by the ratio of the num- 
ber trixels in the two covers. The areas of trixels on the 
same level vary less than about 40% but the collections 
of (random) trixels would average out this variance to a 
more precise estimation of the area ratio. 

5. SOFTWARE PACKAGES 

Our design of the software implementation was driven 
by several requirements. It needed to be architecture 
and operating-system independent that also integrates 
well with the relational database technology, which is 
at the core of most modern astronomy archives today. 
We chose to build our solution in the .NET Frame- 
worlQ programming model that satisfies our develop- 
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Fig. 7. — A sample region from the SDSS geometry illustrates the various spherical concepts: {from left to right) (1) Circles of all halfspace 
contraints of the region. (2) The outline as used in the HTM algorithm. (3) The inner (filled gray) and outer covers (open triangles) of the 
region using larger and (4) smaller trixels. 



merit, maintainance, portability and performance re- 
quirements. Such managed code runs in a virtual ma- 
chine called the Common Language Runtime, or CLR for 
short. In addition to Microsoft's CLR implementation 
of the Common Language Infrastructure (CLI), there is 
also an open-source cross-platform runtime by the Mono 
Projeclj^ The .NET Framework is not only OS inde- 
pendent but also allows for development in several pro- 
gramming languages and the integration of projects in a 
mixture of languages. Using the C# programming lan- 
guage, we built a set of class libraries that depend on 
each other, so that applications in any one of the sup- 
ported languages can choose to include the appropriate 
modules selectively. 

5.1. The Spherical Library 

The basic module or assembly contains the routines 
to deal with the spherical geometry. Generic container 
classes are used to store the collection of halfspaces in 
a convex and the collection of convexes that make up a 
region. Thus managing a shape is as simple as working 
with lists. Here we show the C# listings of a trivial 
example with a convex of two halfspaces to illustrate the 
simplicity of the coding. First, we define the centers of 
the caps using J2000 (R.A., Dec.) coordinates, 

Cartesian pi = new Cartesian (180, 0); 
Cartesian p2 = new Cartesian(181 , 0); 

then set the radius of the caps and create the halfspaces 
using the centers 

double theta = Math. PI / 180; // 1 degree radii 
Halfspace hi = new Half space(pl, Math. Cos (theta) ) ; 
Halfspace h2 = new Half space(p2, Math. Cos (theta) ) ; 

The convex is a collection of halfspaces that are added 
one by one after which we invoke the simplification of the 
description that also derives the arcs and patches of the 
shape along with its analytic area in square degrees. 

Convex c = new Convex (); 

c.Add(hl); 

c.Add(h2); 

c.SimplifyO ; 

Console . Out . WriteLine (c . Area) ; 

Similary a region is created by adding convexes to its 
collection. 
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Region r = new RegionO ; 

r.Add(cl); 

r.Add(c2); 

r. Simplify ; 

Console . Out . WriteLine (r . Area) ; 

Naturally, the number of convexes in a region and half- 
spaces in a convex are not limited to two (and can also 
be one), they are only bound by system memory and 
computational power for processing. 

Boolean operations on the shapes are implemented to 
perform the computations in place whereever possible. 
For example, the union of regions or the intersection of 
convexes can be done within the instance on which the 
method is called, but the difference of two convexes or 
regions is implemented to return the resulting region. On 
top of the straightforward translations of the algorithms 
in Section sec:alg, most operations have a smart version 
for simplified shapes. One of the advantages of using 
these methods is the speedup for complicated shapes, 
where the precomputed bounding circles reduce the com- 
putational costs. The other benefit is potentially even 
greater. E.g, the union of simplified regions can be done 
based on the assumption that the convexes of the re- 
gions are already disjoint, and hence, check for collisions 
among the mixed pairs only. In code, the snipets 

r.Union(r2); // regions may not be simplified 
r .SimplifyO ; // simplification from scratch 
Console . Out . WriteLine (r . Area) ; 

and 

r .SmartUnion(r2) ; // both simplified 
Console . Out . WriteLine (r . Area) ; 

are conceptually identical but the latter can take signifi- 
canly less time as it can assume the arguments to be in 
the canonical form. 

The application programming interface (API) supports 
many more advanced ways to create and manipulate re- 
gions of arbitrary complexity, the working code snipets 
above are here to serve as guide. 

5.2. Numerical Imprecisions 

Numerical stability of the implementation of the 
spherical algorithms is crucial and has not been eas- 
ily achieved. Computations with floating-point numbers 
make errors that can accumulate in a series of opera- 
tions. These uncertainties can result in erroneous deter- 
mination of topological relations of points and planes, 
convexes and regions. One possible solution is not to 
use floating-point arithmetics. Software packages exist 
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that work with rational numbers, and represent them as 
a pair of integers: the numerator and the denominator. 
In this setting, the values are always precise but there 
is an efficiency penalty. In spherical geometry, there is 
an even more severe problem with this approach. The 
set of rational numbers is not closed for operations that 
one has to routinely perform, e.g., the square root of a 
rational number can be irrational. 

We use IEEE Standard 754 double-precision floating- 
point numbers in our implementations that most modern 
CPUs can efficiently process, and we carefully analyze 
the code for numerical stability and rounding problems. 
It is a surprisingly hard task. For decades, linear algebra 
routines, e.g., those in LAPACK, have been strengthened 
and optimized by hand because no generic solution ex- 
ists; only best practices. Classic examples include the 
robust solution to the quadratic equation, or evaluating 
the values of o? — lP' and log(l + x). To illustrate the im- 
portance, here we describe real-life problems that make 
the spherical geometry computations more difficult than 
expected, and explain our solutions briefly. The details of 
error propagation in floating-point arithmetics is beyond 
the scope of this wr iting. Rather, we refer the interested 
reader to Goldberg (1991). 

Previously we said^that deciding whether a point is 
inside a halfspace, the most basic containment test dis- 
cussed in Section [2J involves calculating the dot product 
of the point's unit vector and the direction of the plane 
and comparing the result to the halfspace's offset, the 
cosine of the cap's radius. The numerical imprecision on 
the result translates to larger errors in the radius because 
the cosine function is quadratic at values near zero, hence 
using the sine can be more advantageous. Even then, 
when the point is very close to the plane, it is essentially 
impossible to decide which side it is on or whether it is 
contained in the plane. We circumvent the problem by 
not trying to answer in or out but allow for an unde- 
termined value within a margin derived from the limi- 
tations of the representation of floating-point numbers. 
This value is deep in the core of routines establishing the 
spatial topological relations of the shapes. 

Another, more subtle, algorithmic problem is the so- 
lution of the intersection of two planes, and the deriva- 
tion of the intersecting points of two circles on th e unit- 
sphere. As discussed in detail by Priamos| ( |1992| ), an el- 
egant robust solution (out of many possible derivations) 
is based on optimizing for the errors made in the com- 
putation of a point that the line crosses. The idea is to 
pick one of the x-y, y-z and x-z planes that is most per- 
pendicular to the direction of the line, and solve for the 
point in that plane. This way one of the coordinates is 
readily fixed to be 0, and we only have to solve a 2-D 
linear equation for the other coordinates. 

Halfspaces and vertices of spherical polygons become 
degenerate very frequently as a result of routine opera- 
tions while building up the geometric description of ob- 
servations. Without controlled accuracy these degenera- 
cies could not be spotted and caught to derive correct 
representations or the regions. 

5.3. Modules for HTM 

Classes and routines of the HTM implementation are 
divided into two basic categories. The creation of the 
hierarchy of triangles is a module of its own. The tree is 



usually not created for its extremely large size but com- 
puted on the fly. Once the algorithm is fixed, the hier- 
archical trixels exists without the actual tree in memory 
or disk. The methods provide the recursion and the ba- 
sic geometrical description of the pixels. For example, a 
single call yields the HtmID, 

Cartesian p = new Cartesian(180,0) ; // Point on the sky 
Int64 htmID = Trixel.CartesianToHid20(c) ; // and its ID 

and another the trixel's geometry, 

Cartesian vl, v2, v3; // Vertices of the trixel 
Trixel.ToTriangleChtmlD, out vl, out v2, out v3) ; 

On top of the core module is a set of classes that deal 
with the regions and their topological relations to the 
trixels. Our efficient implementation of the HTM makes 
use of the internal structure of the region, and the de- 
rived properties of its convexes and patches as well as the 
outline. In the API, all the complexity is hidden behind 
simple 

List<Int64> trixels = Cover .HidList (region) ; 
Console . Out . WriteLine (trixels . Count) ; 

Alternatively one can explicitly instantiate a cover ob- 
ject, and investigate the properties during processing 

Cover k = new Cover (region) ; 

k.RunO; // Default processing and stopping 

// Call k.StepO for more control 
List<Int64> inner = k. GetTrixels (Markup. Inner) ; 
List<Int64> outer = k. GetTrixels (Markup. Outer) ; 
List<Int64> parti = k. GetTrixels (Markup. Partial) ; 

Similarly the intervals are retrieved by a single method 
call and print in the following example. 

List<Int64Pair> ranges = Cover .HidRange (region) ; 
foreach (Int64Pair p in ranges) 
Console . WriteLine (p) 

As shown in the examples above the HTM and Spheri- 
cal Library classes and routines work together seemlessly 
by design. They leverage all the information available to 
perform the operations as fast as possible, while keep- 
ing the usage patterns simple. Under these high-level 
routines, powerful methods provide easy customization 
of the code for other type of problems and applications. 
One such example is the NVO's Footprint Service that 
uses high-resolution HTM ranges to look for overlapping 
footprint, or regions that contain a given point. 

5.4. Region in a String 

Our basic internal string representation of the regions 
follows the simple structure of the collections. We enu- 
merate the convexes and all their halfspaces 

REGION 

CONVEX CARTESIAN xl yl zl cl 

CARTESIAN xN yN zN cN 
CONVEX . . . 

with arbitrary whitespace and linefeed characters. The 
cartesian coordinates are the usual transformations of 
the J2000 (a, 5) by equations 



x = cos5 cos a 
y = cos5 sin Of 
z = sin J 



(16) 

(17) 
(18) 
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On top of this simple description, the interfaces also sup- 
port more advanced concepts via region parser based on 
our grammar in the Backus-Naur Form (BNF). The use 
of these constructs are often more straighforward than 
spelling out the (x, y, z) coordinates of the normal vec- 
tors. Here we show the case for a couple of simple con- 
vexes often used by astronomers. The region describes 
the union of a circle with a radius of 60' around the center 
at (a, S) = (180°, 0°) in the J2000 coordinate system and 
a great-circle polygon specified by its ordered vertecies 
given by the angles. 

REGION 

POLY J2000 180 182 182 2 180 2 
CIRCLE J2000 180 60 

The polygon can be built up by any number of vertices 
(greater than two) and the region can have any combina- 
tion of convex constructs. Another useful feature is the 
convex hull of a point set that is specified by the keyword 
CHULL. To specify cartesian coordinates, one can replace 
the J2000 keyword with CARTERSIAN and enumerate the 
components of the unit vector instead of the anglular 
coordinates. 

5.5. SQL Routines 

One of the most attractive advantages of developing 
in the .NET programming model is the elegant integra- 
tion with SQL Server since the 2005 Version. The run- 
time is actually hosted inside the engine, which allows for 
the customization of the database to satisfy the astron- 
omy specific requirements. The custom assemblies can 
be loaded to be part of the database along with the cat- 
alog data, and User-Defined Functions (UDFs) can wrap 
the functionality of the managed code that are invoked 
efficiently from SQL at query time. This way custom 
programs can run inside the database right there where 
the data are, and perform analyses without moving the 
bits on the network or even outside the SQL engine. 

Our harness for the spherical implementation is 
schema-independent by design. This means that the 
same SQL routines can be present and be used in any 
astronomy science archive regardless of the layout of the 
database or the content of the tables. In fact, it is cur- 
rently in use in a wide variety of services, including the 
Sloan Digital Sky Survey, the Galaxy Evolution Explorer 
and the Hubble Legacy Archive, being integrated with 
the Spitzer and Chandra servers, among others. 

The regions are serialized into a compact binary format 
that contains both the halfsp ace- convex- region represen- 
tation and the patches along with their minimal enclosing 
circles. The simplest way to create a region is by using 
the internal specification language that can describe any 
region but the operations are also supported when start- 
ing with basic building blocks. For example, the union 
of a 60' radius circle and the spherical rectangle can be 
coded as follows. 

DECLARE @s VARCHAR(MAX) , (9r VARBINARY(MAX) , 
(9z VARCHAR(MAX), (9u VARBINARY(MAX) 

SELECT (9s = 'REGION CIRCLE J2000 180 60' , 

(9z = 'POLY J2000 180 182 182 2 180 2', 

Or = sph.fSimplifyString((9s) , 

(9u = sph.f Union (Or, sph.fSimplifyString((9z)) 

SELECT sph.fGetArea((9r) , SELECT sph.f GetArea((9u) 

GO 

— 3.14151290574491 6.35572804450646 

In SQL, the binary blobs of arbitrary size (up to 2GB) are 



represented as VARBINARY(MAX), and here we use vari- 
ables of that type. Naturally, one can also create tables 
with VARBINARY columns to save the resulting regions 
within SQL Server. We note that previous version of the 
SQL harness, e.g., currently deployed in the SDSS Cat- 
alog Archive Server, do not use the separate sph schema 
but instead keep the UDFs in the default dbo using an 
Sph prefix, e.g., in dbo.f SphGetArea(@r). 

The HTM routines in SQL provide high-performance 
spatial searches in combination with the builtin indexing 
mechanism. Scalar-valued UDFs can compute the ad- 
dress of a point at the default level of 20. Here we list a 
simple example that updates a table called PhotoObj to 
set the HtmlD column for all rows based on their J2000 
coordinates 

UPDATE PhotoObj SET HtmID = dbo.f HtmEq(RA,Dec) 

Using spherical regions to search for sources is also very 
straightforward and very efficient. In the following snip- 
pet we fetch only the rows that are in the HTM cover, 
hence are probably contained in the region: 

WITH Cover AS 



( 



) 



SELECT * FROM dbo . f HtmCoverRegion 

('REGION CIRCLE J2000 180 10') 



SELECT o.ObjID 

FROM PhotoObj AS o INNER JOIN Cover AS c 

ON o. HtmID BETWEEN c.HtmlDStart AND c.HtmlDEnd 
GO 

The UDF returns a table of the HTM ranges of the ap- 
proximation of the specified shape that overshoots, and 
the join returns instantly all the possible rows. In a real- 
life search, the WHERE clause of the query would have a 
separate containment test with the region geometry. The 
sole purpose of the HTM cover is to fetch the good can- 
didates only from the disk, and it does this very fast. 

5.6. The Astronomical Data Query Language 

As part of the current standardization efforts of the 
International Virtual Observatory Alliancaj(IVOA), the 
Space-Time Coordinate metadata (STC) provides an al- 
ternative to accurately describing shapes on the celes- 
tial sphere. An STC region has both XML (STC-X) 
and string (STC-S) serializations that are well mapped 
onto our data structures, and translators are being imple- 
mented to support them. The STC-X parser is deployed 
and being tested as part of the the National Virtual Ob- 
servatory's (NVOF]) Footprint ServicJ^ and the STC-S 
capabilities shall oe part of future releases of our soft- 
ware packages and the service. The plain string repre- 
sentation STC-S is very straightforward by default. The 
union of a set of convexes looks like 



Union J2000 
( 



Convex 



Convex 



10 

-10 

10 



0.1 

-0.5 

0.0 



The simplicity of the representation comes from the de- 
faults that are substituted automatically for the missing 



^ Click to visit h ttp : //www . ivoa . n et 
^ Click to visit http : //us- vo . orp^ 



^ Click to visit http://voservices.net/footprint 
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STC elements but it is also possible to be explicit about 
every detail and even have different coordinate systems 
for the components, e.g., in 



Union 

( 

Polygon FK5 J2000 
Circle FK4 B1950 

) 



180 
179 



182 
2 



182 2 180 2 



In addition to the union, STC sports a large set of fea- 
tures that also include intersection, difference and nega- 
tion of any of the shapes. 

The IVOA also working toward a standardized search 
language called the Astronomical Data Query Language 
(ADQL), which is an IVOA Recommendation as of this 
writing. ADQL is an extended subset of the ANSI SQL- 
92 standard, which adds geometrical constraints via sim- 
plified CIS-like functions (e.g., circle) and the full sup- 
port of aforementioned STC representation. Building on 
our legacy SQL routines that currently work with the 
internal string representation, we are now creating new 
versions that input STC-S, and are designing an efficient 
implemention that supports the full ADQL. 

6. SUMMMARY 

We explored the properties of generalized spherical 
polygons. We found the convex representation to be 
very compact and efficient. The boolean region opera- 
tions are straightforward within this framework but in- 
evitably create convoluted and redundant descriptions. 
The simplification involves solving the spherical geome- 



try of the region. In the process, we summarized how to 
analytically calculate the areas and create the outlines of 
the regions. 

Pixelization schemes also significantly benefit from the 
better representation. An optimal set of HTM triangles 
can be computed to cover entirely a region, or just to 
approximate the inside, in much less time than before. 
We introduced a heuristic hybrid solutions for simplifica- 
tions based on the Igloo pixelization hierarchy that can 
tackle massive geometries where the brute-force method 
fails to deliver. 

Our portable implementation in the .NET program- 
ming model is available for developers on the projects 
website. Its design enables quick development of new 
applications and a clean and easy integration with the 
SQL Server database engine that hosts several science 
archives today. The provided working examples illus- 
trate common search patterns in C# and SQL. Future 
works include more algorithmic optimizations and full 
support of the IVOA's STC and ADQL standards. 
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